Unbiased learning to rank (ULTR) studies the problem of mitigating various biases in implicit user feedback data such as clicks, and has received considerable attention recently. A popular ULTR approach for real-world applications uses a two-tower architecture, where click modeling is factorized into a relevance tower with regular input features and a bias tower with bias-relevant inputs such as the position of a document. A successful factorization allows the relevance tower to be free of biases. In this work, we identify a critical issue that existing ULTR methods ignore: the bias tower can be confounded with the relevance tower via the underlying true relevance. In particular, the positions were determined by the logging policy, i.e., the previous production model, which itself carries relevance information. We give both theoretical analysis and empirical results to show the negative effects of such a correlation on the relevance tower. We then propose three methods to mitigate the negative confounding effects by better disentangling relevance and bias. Empirical results on both controlled public datasets and a large-scale industry dataset show the effectiveness of the proposed approaches.
translated by Google Translate
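The two-tower factorization described in the ULTR abstract above can be illustrated with a minimal sketch. It assumes the common additive-logit form, P(click) = sigmoid(relevance + bias), which the abstract does not specify; the linear relevance tower, the per-position logits, and all names and numbers below are hypothetical and are not the authors' method.

```python
import math


def sigmoid(z: float) -> float:
    """Logistic function mapping a logit to a probability."""
    return 1.0 / (1.0 + math.exp(-z))


def relevance_tower(features, weights):
    """Hypothetical linear relevance tower: scores a document from its regular features."""
    return sum(f * w for f, w in zip(features, weights))


def bias_tower(position, position_logits):
    """Hypothetical bias tower: one logit per display position."""
    return position_logits[position]


def click_probability(features, position, weights, position_logits):
    """Additive two-tower click model (assumed form): sigmoid(relevance + bias)."""
    logit = relevance_tower(features, weights) + bias_tower(position, position_logits)
    return sigmoid(logit)


# Illustrative values only: the same document shown at two different positions.
weights = [0.8, -0.3]
position_logits = [0.0, -0.7, -1.4]
doc = [1.0, 2.0]

p_top = click_probability(doc, 0, weights, position_logits)
p_low = click_probability(doc, 2, weights, position_logits)
print(p_top, p_low)
```

In this sketch the relevance score of the document is identical at both positions, and only the bias term changes the click probability. The confounding issue raised in the abstract arises because, in logged data, position is itself chosen by a model that knows something about relevance, so the two inputs are not independent during training.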
In this paper, we present a novel and effective framework, named 4K-NeRF, to pursue high-fidelity view synthesis in challenging ultra-high-resolution scenarios, building on the methodology of neural radiance fields (NeRF). The rendering procedure of NeRF-based methods typically works in a pixel-wise manner, in which rays (or pixels) are treated independently in both the training and inference phases, which limits their ability to represent subtle details, especially when lifting to an extremely high resolution. We address the issue by better exploring ray correlation to enhance high-frequency details, benefiting from the use of geometry-aware local context. In particular, we use a view-consistent encoder to model geometric information effectively in a lower-resolution space and recover fine details through a view-consistent decoder, conditioned on the ray features and depths estimated by the encoder. Joint training with patch-based sampling further allows our method to incorporate supervision from perception-oriented regularization beyond the pixel-wise loss. Quantitative and qualitative comparisons with modern NeRF methods demonstrate that our method can significantly boost rendering quality by retaining high-frequency details, achieving state-of-the-art visual quality in the 4K ultra-high-resolution scenario. Code is available at \url{https://github.com/frozoul/4K-NeRF}
Approximating radiance fields with volumetric grids is one of the promising directions for improving NeRF, represented by methods like Plenoxels and DVGO, which achieve super-fast training convergence and real-time rendering. However, these methods typically require a tremendous storage overhead, costing up to hundreds of megabytes of disk space and runtime memory for a single scene. We address this issue in this paper by introducing a simple yet effective framework, called vector quantized radiance fields (VQRF), for compressing these volume-grid-based radiance fields. We first present a robust and adaptive metric for estimating redundancy in grid models and performing voxel pruning by better exploring the intermediate outputs of volumetric rendering. A trainable vector quantization is further proposed to improve the compactness of grid models. In combination with an efficient joint tuning strategy and post-processing, our method can achieve a compression ratio of 100$\times$ by reducing the overall model size to 1 MB with negligible loss in visual quality. Extensive experiments demonstrate that the proposed framework achieves unrivaled performance and good generalization across multiple methods with distinct volumetric structures, facilitating the wide use of volumetric radiance field methods in real-world applications. Code is available at \url{https://github.com/AlgoHunt/VQRF}
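The vector quantization idea in the VQRF abstract above can be sketched with a plain nearest-neighbor codebook lookup. This is a generic illustration under stated assumptions: the codebook here is fixed rather than trainable, there is no voxel pruning or joint tuning, and the feature values and function names are hypothetical rather than taken from the VQRF code.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def quantize(features, codebook):
    """Map each feature vector to the index of its nearest codebook entry."""
    return [
        min(range(len(codebook)), key=lambda k: squared_distance(f, codebook[k]))
        for f in features
    ]


def dequantize(indices, codebook):
    """Recover approximate features by looking indices up in the codebook."""
    return [codebook[i] for i in indices]


# Hypothetical 2-D voxel features and a 3-entry codebook (illustrative values only).
codebook = [[0.0, 0.0], [1.0, 1.0], [0.0, 1.0]]
voxel_features = [[0.1, -0.1], [0.9, 1.2], [0.2, 0.8], [1.1, 0.9]]

indices = quantize(voxel_features, codebook)
restored = dequantize(indices, codebook)
print(indices)
```

Storing one small integer index per voxel plus a shared codebook, instead of a full feature vector per voxel, is the source of the storage saving; the compression ratio reported in the abstract also depends on the pruning and post-processing steps that this sketch omits.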
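Code generation aims to automatically generate code snippets from natural language descriptions. Generally, mainstream code generation methods rely on a large amount of paired training data, including both natural language descriptions and code. However, in some domain-specific scenarios, building such a large paired corpus for code generation is difficult, because no paired data is directly available and considerable effort is required to manually write code descriptions in order to construct a high-quality training dataset. With limited training data, the generation model cannot be trained well and is prone to overfitting, which makes its performance unsatisfactory for real-world use. To this end, in this paper we propose a task augmentation method that incorporates domain knowledge into code generation models through auxiliary tasks, and a Subtoken-TranX model that extends the original TranX model to support subtoken-level code generation. To verify our proposed approach, we collect a real-world code generation dataset and conduct experiments on it. Our experimental results show that the subtoken-level TranX model outperforms the original TranX model and the Transformer model on our dataset, and that the exact match accuracy of Subtoken-TranX improves significantly, by 12.75\%, with the help of our task augmentation method. The model performance on several code categories satisfies the requirements for application in industrial systems. Our proposed approach has been adopted by Alibaba's \emph{BizCook} platform. To the best of our knowledge, this is the first domain-specific code generation system adopted in an industrial development environment.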
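Visual place recognition is a challenging task for applications such as autonomous driving navigation and mobile robot localization. Distracting elements present in complex scenes often lead to deviations in the perception of visual place. To address this problem, it is crucial to integrate information from only the task-relevant regions into image representations. In this paper, we introduce a novel holistic place recognition model, TransVPR, based on vision Transformers. It benefits from the desirable property of the self-attention operation in Transformers, which can naturally aggregate task-relevant features. Attention from multiple levels of the Transformer, each focusing on different regions of interest, is further combined to generate a global image representation. In addition, the output tokens from the Transformer layers, filtered by the fused attention mask, are treated as key-patch descriptors, which are used to perform spatial matching to re-rank the candidates retrieved by the global image features. The whole model allows end-to-end training with a single objective and image-level supervision. TransVPR achieves state-of-the-art performance on several real-world benchmarks while maintaining low computation time and storage requirements.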
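The Gaussian-smoothed optimal transport (GOT) framework, pioneered in Goldfeld et al. (2020) and followed up by a series of subsequent papers, has quickly attracted attention among researchers in statistics, machine learning, information theory, and related fields. One key observation made therein is that, by adopting the GOT framework instead of its unsmoothed counterpart, the curse of dimensionality in using the empirical measure to approximate the true data-generating distribution can be lifted. The current paper shows that a related observation applies to the estimation of nonparametric mixing distributions in discrete exponential family models, where under the GOT cost the estimation accuracy of the nonparametric MLE can be accelerated to a polynomial rate. This is in sharp contrast to the classical sub-polynomial rates based on unsmoothed metrics, which cannot be improved from an information-theoretic perspective. A key step in our analysis is the establishment of a new Jackson-type approximation bound for Gaussian-convoluted Lipschitz functions. This insight bridges existing techniques for analyzing nonparametric MLEs and the new GOT framework.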
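Most object detection methods obtain objects by using non-maximum suppression (NMS) and its improved versions, such as Soft-NMS, which have a long history, to remove redundant bounding boxes. We challenge these NMS-based methods from three aspects: 1) The bounding box with the highest confidence value may not be the true positive with the largest overlap with the ground-truth box. 2) Redundant boxes need to be suppressed, but the true positives also need their confidence enhanced. 3) Sorting candidate boxes by confidence value is not necessary, so full parallelism is achievable. In this paper, inspired by belief propagation (BP), we propose the Confidence Propagation Cluster (CP-Cluster) to replace NMS-based methods; it is fully parallelizable and also more accurate. In CP-Cluster, we borrow the message passing mechanism of BP to penalize redundant boxes and enhance true positives simultaneously, in an iterative way until convergence. We verify the effectiveness of CP-Cluster by applying it to various mainstream detectors such as FasterRCNN, SSD, FCOS, YOLOv3, YOLOv5, and CenterNet. Experiments on MS COCO show that our plug-and-play method, without retraining the detectors, steadily improves the average mAP of all these state-of-the-art models by a clear margin of 0.2 to 1.9 compared with NMS-based methods. Source code is available at https://github.com/shenyi0220/cp-cluster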
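For context on the abstract above, the greedy NMS baseline that CP-Cluster is meant to replace can be sketched as follows. This is a minimal illustration of standard greedy NMS, not of CP-Cluster itself; the box coordinates, scores, and the 0.5 IoU threshold are hypothetical values chosen for the example.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def greedy_nms(boxes, scores, iou_threshold=0.5):
    """Return indices of kept boxes, visiting boxes in descending score order."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        # Keep a box only if it does not overlap too much with any kept box.
        if all(iou(boxes[i], boxes[j]) <= iou_threshold for j in keep):
            keep.append(i)
    return keep


# Two heavily overlapping boxes and one separate box (illustrative values only).
boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)]
scores = [0.9, 0.8, 0.7]

kept = greedy_nms(boxes, scores)
print(kept)
```

The sketch makes the three criticisms in the abstract concrete: the top-scoring box always wins regardless of its overlap with the ground truth, the scores of kept boxes are never raised, and the sort plus the sequential loop over kept boxes prevents full parallelism.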